Historic,  archived  document 

Do  not  assume  content  reflects  current 
scientific  knowledge,  policies,  or  practices. 




SIZE OF SAMPLE STUDY

With particular reference to the Pig Survey of the
Bureau of Agricultural Economics

by

Bradford B. Smith
&
Mordecai Ezekiel

Completed October 30, 1924


Contents.

Introduction
Experimental data used
Theory
Application to data
Test of Application
Representativeness of Sample: Pig Survey
Propagation of Error in Obtaining U. S. Average
Determining the requisite number of records for a given degree
   of accuracy
Note on relationship of correlation between x & y in the sample
   and the correlation between averages of x and y in successive
   samples


The authors wish to express their appreciation
for the assistance rendered by Mr. H. R. Tolley
in the development and application of the
theoretical considerations involved.


SIZE OF SAMPLE STUDY: PIG SURVEY.

1) Introduction.

A considerable portion of economic forecasting is based on the
"this year to last year" type of ratio. Using the semi-annual "Pig
Survey" of the Division of Crop and Livestock Estimates, Bureau of
Agricultural Economics, as an example, the reported number of sows
farrowed this year and the reported number of sows farrowed last year
are tabulated, and the ratio between them is taken as the ratio of the
hog crop to come this year to the known hog crop of last year. The
occasion for this study was to test the statistical accuracy of this
"sows farrowed this year to sows farrowed last year" ratio. To
accomplish this required the application of some known theory and the
development of some new. This paper accordingly aims (1) to set forth
the theoretical considerations in measuring the accuracy of such a
year-to-year ratio, (2) to test them by application to a portion of
the records used in making a pig survey, and (3) to draw certain
conclusions specifically related to the pig survey.

2) Experimental data used.

The 8,000 (circa) records for the State of Iowa used in the
June, 1924 Pig Survey were taken as the basis for securing the
necessary statistical constants for use in the formulae applicable
to the pig surveys, and also as experimental material in testing
out the application of the theory developed. These records were
secured through the cooperation of the Bureau of Agricultural
Economics and the Post Office Department, the schedules being
filled out by rural postmen and returned to the Bureau.

3) Theory

According to definition the standard error of an average, a
ratio, indeed of any measure, is the standard deviation of the
frequency curve which would be formed by an infinite repetition
thereof. That is, if we had a large number of pig surveys taken
under identical sampling conditions and the corresponding sows
farrowed ratios ($X_1/X_2$), these ratios would fall on a frequency
curve of standard deviation, s; and since the standard deviation of
the frequency curve is the standard error of the individual
observation, s would then be the standard error measuring the
precision of the single ratio, $X_1/X_2$, or (sows farrowed this
spring)/(sows farrowed last spring). Since we do not have the
requisite number of pig surveys to give us the constants, we must
approximate them in some other manner.

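From Yule 1/, Bowley 2/ and Merriman 3/ the following formula
for the standard deviation, s, of a series of ratios of form
$X_1/X_2$ may be secured:

$$ s = \frac{M_1}{M_2}\sqrt{v_1^2 + v_2^2 - 2\,r_{12}\,v_1 v_2} \qquad \ldots(1) $$

in which (Yule's terminology) $v_1 = \sigma_1/M_1$, $v_2 = \sigma_2/M_2$,
$M_1$ and $M_2$ are the means of $X_1$ and $X_2$, $\sigma_1$ and
$\sigma_2$ their standard deviations, and $r_{12}$ the correlation
between them.

1/ G. Udny Yule: Introduction to the Theory of Statistics, p. 215
2/ Arthur L. Bowley: Elements of Statistics, p. 319
3/ Mansfield Merriman: Method of Least Squares, p. 79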
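The behaviour of a ratio formula of this kind may be checked numerically. The following is a minimal sketch only: it assumes the first-order form s = (M1/M2) sqrt(v1^2 + v2^2 - 2 r v1 v2), and the means, standard deviations, correlation and the use of normally distributed draws are all hypothetical choices made for illustration, not figures from the survey.

```python
import math
import random

def ratio_sd_first_order(m1, m2, sd1, sd2, r):
    """First-order standard deviation of X1/X2 from means, SDs and correlation."""
    v1 = sd1 / m1  # coefficient of variation of X1
    v2 = sd2 / m2  # coefficient of variation of X2
    # Guard against a tiny negative value caused by floating-point rounding.
    inner = max(0.0, v1 * v1 + v2 * v2 - 2.0 * r * v1 * v2)
    return (m1 / m2) * math.sqrt(inner)

def simulate_ratio_sd(m1, m2, sd1, sd2, r, draws, seed=0):
    """Standard deviation of X1/X2 over many correlated normal draws."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x2 = m2 + sd2 * z2
        # Mix z1 and z2 so that X1 and X2 have correlation r.
        x1 = m1 + sd1 * (r * z2 + math.sqrt(1.0 - r * r) * z1)
        ratios.append(x1 / x2)
    mean = sum(ratios) / draws
    return math.sqrt(sum((q - mean) ** 2 for q in ratios) / draws)

# Hypothetical constants, chosen so that the denominator stays far from zero.
formula_sd = ratio_sd_first_order(100.0, 120.0, 5.0, 6.0, 0.6)
simulated_sd = simulate_ratio_sd(100.0, 120.0, 5.0, 6.0, 0.6, draws=20000)
print(f"formula: {formula_sd:.5f}   simulation: {simulated_sd:.5f}")
```

With dispersions as small as these the two figures should agree closely; the approximation is expected to weaken as the dispersion of the denominator grows.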


This formula, it may be observed, calls for constants derived
from a series of ratios (surveys), while we have but one. The
necessary constants being means, standard deviations, and
correlation, they may be approximated as follows:

For the ratio of the means of a series of sample pig surveys we
may take as our closest approximation the ratio of the single survey
which we have. (This ratio in practice never varied beyond
[illegible].) The standard deviation of a series of means is, of
course, approximated by the standard error of the single mean and may
therefore be secured by dividing the standard deviation in the single
sample by the square root of the number of items composing the
sample. We may thus approximate $v_1$ by writing
$v_1 = \sigma_1/(m_1\sqrt{n})$, in which $\sigma_1$ is now the standard
deviation of the number of sows farrowed this spring, $m_1$ the mean
of the same series and n the number of records in our sample.
Similarly $v_2 = \sigma_2/(m_2\sqrt{n})$.

$r_{12}$, the correlation of a series of averages, $m_1$'s, with
corresponding averages, $m_2$'s, would ordinarily be taken as equal
to the correlation between $x_1$ and $x_2$ within our sample. But as
shown in section nine of this paper this is true only when the
relationship between the variates is strictly linear. When the
relationship is curvilinear a better measure of the correlation
between averages of successive samples is the correlation index 4/.
As a matter of safety, therefore, it is better to use the index.
A plotting of the original data revealed a very distinct curvilinear
relationship in our experimental data. With last spring as the
dependent the correlation ratio was .86 and the correlation index
computed from a free hand curve after the method of Ezekiel 4/ was
approximately .8, as compared with a correlation coefficient of .56.

4/ See: Frederick C. Mills: The Measurement of Correlation and
the Problem of Estimation. Journal Am. Stat. Assoc., Vol. XIX,
No. 147 (Sept. 1924). See also: Mordecai Ezekiel: A Method of
Handling Curvilinear Correlation for any Number of Variables.
Journal Am. Stat. Assoc., Vol. XIX, No. 148 (Dec. 1924).

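With these considerations in mind the standard error, e, of
the ratio of sows farrowed this spring to sows farrowed last spring,
$m_1/m_2$, may now be written, with $\rho$ (rho) the correlation index,

$$ e = \frac{m_1}{m_2}\sqrt{\frac{\sigma_1^2}{n\,m_1^2} + \frac{\sigma_2^2}{n\,m_2^2} - \frac{2\rho\,\sigma_1\sigma_2}{n\,m_1 m_2}} $$

or, bringing n outside the parentheses, and taking root,

$$ e = \frac{m_1}{m_2\sqrt{n}}\sqrt{\frac{\sigma_1^2}{m_1^2} + \frac{\sigma_2^2}{m_2^2} - \frac{2\rho\,\sigma_1\sigma_2}{m_1 m_2}} \qquad \ldots(2) $$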

4) Application to data.

From the sample data (State of Iowa), considered typical of
the Corn Belt, the following values for the necessary constants
were secured.

   Means             Standard Deviations     Correlation
   $m_1$ = 11.81     $\sigma_1$ = 10.45      Coefficient, $r$ = .56
   $m_2$ = 14.38     $\sigma_2$ = 13.47      Ratio, $\eta$ = .86
                                             Index, $\rho$ = .8

Substituting these values in the formula (2), we have

$$ e = \frac{.475}{\sqrt{n}} \qquad \ldots(3) $$

Another method of stating this relation is "the percentage
that the probable error is of the ratio," which is

$$ \frac{.6745\,e}{m_1/m_2}\times 100 = \frac{39.0}{\sqrt{n}} \qquad \ldots(4) $$

Formula (4) is graphed at the end of this paper.
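The substitution may be retraced by machine. The sketch below assumes the standard error of the ratio takes the form (m1/m2) sqrt((v1^2 + v2^2 - 2 rho v1 v2)/n), with v = sigma/m, and that the probable error is .6745 of the standard error; the function names are illustrative only. The constants are the Iowa values listed above.

```python
import math

def ratio_standard_error(m1, m2, sd1, sd2, rho, n):
    """Standard error of m1/m2 for a sample of n records (assumed form)."""
    v1 = sd1 / m1
    v2 = sd2 / m2
    inner = max(0.0, v1 * v1 + v2 * v2 - 2.0 * rho * v1 * v2)
    return (m1 / m2) * math.sqrt(inner / n)

def percent_probable_error(m1, m2, sd1, sd2, rho, n):
    """Probable error (.6745 x standard error) as a percentage of the ratio."""
    e = ratio_standard_error(m1, m2, sd1, sd2, rho, n)
    return 100.0 * 0.6745 * e / (m1 / m2)

# Iowa constants from the table of means, standard deviations and index.
M1, M2 = 11.81, 14.38
SD1, SD2 = 10.45, 13.47
RHO = 0.8

# With n = 1 the results are the numerators that get divided by sqrt(n).
e_unit = ratio_standard_error(M1, M2, SD1, SD2, RHO, 1)
pe_unit = percent_probable_error(M1, M2, SD1, SD2, RHO, 1)
print(f"standard error of ratio  = {e_unit:.4f} / sqrt(n)")
print(f"percent probable error   = {pe_unit:.2f} / sqrt(n)")
```

Because the sample size enters only through the square root, a single pair of numerators serves for every value of n.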


5) Test of Application

A test of the theoretical results was made as follows:

(a) The 8,000 (circa) records were shuffled and 30 samples of
approximately 2[illegible]0 items were dealt out. The following
data for each sample were then recorded:

      Number in sample, n
      Average number of sows farrowed this spring, $m_1$
      Average number of sows farrowed last spring, $m_2$

The ratio, $m_1/m_2$, was computed.

(b) The cards were re-shuffled and similar data recorded for
another 30 samples, making 60 (N) samples in all.

(c) The following means and standard deviations of this set of
N samples were then secured.

      $m_1$ = 11.81      $m_2$ = 14.38      $m_1/m_2$ = .822

The correlation between the series of averages, $r_{m_1 m_2}$, was
.76 which, it may be noted, approximates $\rho$ = .8, the value used
in the formulae; the reason for using the index of correlation is
given in section nine.

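(d) For five of the samples (selected at random) the
following values were secured:

Sample #    $m_1$    $m_2$    $\sigma_1$   $\sigma_2$    $r$
   1        11.75    13.56    12.7         16.0          .67
   2        (one entry illegible; legible entries in order:
             13.99, 18.9, 15.2, .50)
   3        9.[illegible]9   17.18   12.7   16.7         .57
   4        (one entry illegible; legible entries in order:
             16.00, 12.4, 14.8, .50)
   5         8.08    11.28    11.0         12.6          .52

Average     10.60    14.56    12.3         15.0          .55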

It may be noted that the correlations in the samples come
quite close to the correlation for the state, .56, which is a
practical demonstration of the statement made in section nine to the
effect that correlation in the sample indicates correlation in the
universe.

(e) Should our hypothesis be correct as stated above and
demonstrated in section nine, and should the data follow a normal
distribution, we should be able to arrive at the standard deviation
of the 60 average ratios, of type $m_1/m_2$, by use of the formula
(2), using constants derived from the five samples analyzed in
detail and the value of $\rho$ for the state.

The error so derived, .028, falls considerably short of the
computed error, .05[final digit illegible]. Before attributing this
discrepancy to errors in the theory let us examine the data with a
view to discovering its normalcy. If normal distribution exists, or
for that matter a skew distribution so long as it is not too
radical, the standard deviation of the series composed of means
should be approximated by the standard error of the single average.
As a matter of fact the figures are thus:

[figures illegible in the source copy]

This disparity naturally diminishes the size of the numerator
in the formula and hence reduces the computed error to .022. In this
case, with means of about the same size as the standard deviations,
the distribution must necessarily be far from the normal upon which
the error formulae are based: for the lowest limit possible in a
record is zero, or one standard deviation less than the mean, while
there is practically no limit to the highest possible value in a
record. A frequency polygon revealed the skewness of the
distribution: some of the larger items were fifteen standard
deviations greater than the mean.

Since the estimated error was about one-half the actual, it
would be a measure of safety to double the estimated error as an
approximation of the probable accuracy; or, conversely, when
estimating the number of items to secure any given error in a pig
survey ratio, one-half this given error should be used in the
computations.
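The shuffle-and-deal test of this section may be imitated with artificial records. The sketch below is an illustration under stated assumptions, not a re-run of the Iowa test: the 7,800 records are synthetic (a skewed gamma distribution for last spring and a linear relation with normal scatter for this spring, both hypothetical), the ordinary correlation coefficient is used because the synthetic relation is linear, and all names are illustrative.

```python
import math
import random
import statistics

def pearson(xs, ys):
    """Ordinary product-moment correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def formula_error(records, n):
    """Predicted standard error of the ratio for samples of n records."""
    this = [t for t, _ in records]
    prev = [p for _, p in records]
    m1, m2 = statistics.fmean(this), statistics.fmean(prev)
    v1 = statistics.pstdev(this) / m1
    v2 = statistics.pstdev(prev) / m2
    r = pearson(this, prev)
    inner = max(0.0, v1 * v1 + v2 * v2 - 2.0 * r * v1 * v2)
    return (m1 / m2) * math.sqrt(inner / n)

def deal_ratios(records, n_samples, rng):
    """Shuffle the records, deal equal hands, return each hand's ratio."""
    deck = records[:]
    rng.shuffle(deck)
    size = len(deck) // n_samples
    ratios = []
    for k in range(n_samples):
        hand = deck[k * size:(k + 1) * size]
        ratios.append(sum(t for t, _ in hand) / sum(p for _, p in hand))
    return ratios

rng = random.Random(1924)
records = []
for _ in range(7800):
    prev = rng.gammavariate(1.2, 12.0)                   # skewed, never negative
    this = max(0.0, 0.82 * prev + rng.gauss(0.0, 5.0))   # hypothetical relation
    records.append((this, prev))

# Two shuffles of 30 hands each, giving 60 sample ratios in all.
ratios = deal_ratios(records, 30, rng) + deal_ratios(records, 30, rng)
observed = statistics.pstdev(ratios)
predicted = formula_error(records, len(records) // 30)
print(f"observed SD of 60 ratios: {observed:.4f}   formula: {predicted:.4f}")
```

In synthetic data of this moderate skewness the two figures should lie near each other; the far heavier tail described in the text is the kind of departure that would pull them apart.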

6) Representativeness of Sample: Pig Survey.

Two criteria of representativeness of sample were considered:

(a) Whether or not the questionnaire will be returned is
largely dependent upon the interest of the informant. Hypothetically
the large scale hog producer, owing to his progressiveness and
financial ability, might well be in the lead in adjustment movements.
At the same time his progressiveness would make him more willing
to return the questionnaire than smaller producers. Thus the results
of the questionnaire might reveal a more progressive situation than
actually existed. If this type of bias exists it may be detected by
noting if there is any constant relation between size of farm and the
ratio change. About three hundred records were accordingly chosen at
random and the "sows farrowed this spring to sows farrowed last
spring" ratio computed. The ratio was then correlated with the size
of farm. The correlation coefficient was less than .05, showing that
there is practically no error attributable to automatic selective
sampling in this survey. Another way of stating the conclusion is:
it makes no difference whether records are taken from large farms or
from small. This conclusion should not be generalized to other
schedule inquiries, however, for there are certain features in the
collection of the pig survey, peculiar to it alone, to which the
validity of this conclusion might be traced.

(b) The second factor considered with reference to
representativeness of sample was the influence of geographical
location on the ratio. Put as a question: "Does the geographical
location have influence upon the sows farrowed this spring to sows
farrowed last spring ratio?" This was answered in the affirmative,
as would be expected, by computing the ratio for different
geographical areas. The ratio was computed for each of the nine crop
estimate districts of Iowa and ranged from .79 to [illegible]. The
conclusion is that only when the ratios are carefully weighted with
reference to the number of sows in the district is the average
valid.
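The effect of weighting district ratios may be seen in a small example. The three district totals below are hypothetical, and taking last spring's sows as the weights is an assumption made for the illustration; with those weights the weighted average of the ratios is the same thing as the ratio of the pooled totals.

```python
# Hypothetical district totals: (sows farrowed this spring, last spring).
districts = [(790.0, 1000.0), (1800.0, 2000.0), (425.0, 500.0)]

ratios = [this / prev for this, prev in districts]
weights = [prev for _, prev in districts]  # assumed weights: last spring's sows

simple_average = sum(ratios) / len(ratios)
weighted_average = sum(w * q for w, q in zip(weights, ratios)) / sum(weights)
pooled_ratio = sum(t for t, _ in districts) / sum(p for _, p in districts)

print(f"simple average of district ratios: {simple_average:.4f}")
print(f"weighted average:                  {weighted_average:.4f}")
print(f"ratio of pooled totals:            {pooled_ratio:.4f}")
```

The simple average gives the smallest district the same voice as the largest, which is why it departs from the pooled figure whenever the district ratios differ.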


7) Propagation of Error in Obtaining U. S. Average.

The United States final figure is obtained by securing a
weighted average of the states' ratios for "sows farrowed this
spring to sows farrowed last spring." The unit is the state. The
weights are intended to approximate the number of sows farrowed
in each state. There is, therefore, a possibility of error in
the failure of the selected weights to conform to both the years
from which the ratio is computed. This type of error, however,
will not be considered in this paper.

The error of a sum in terms of the errors of its parts,
with weights, w, is 5/

$$ e_{\mathrm{sum}} = \sqrt{\sum w^2 e^2} \qquad \ldots(5) $$

5/ Bowley, p. 316. Merriman, Chap. VII, Formula 102.

Since to obtain the weighted mean, we divide the weighted sum
by the sum of the weights, the error of the mean would be the
value of formula (5) divided by $\sum w$. Hence,

$$ e_m = \frac{\sqrt{\sum w^2 e^2}}{\sum w} \qquad \ldots(6) $$

To obtain the error for the U. S. ratio, then, compute the
errors for each component of the average by formula (2). From
these errors compute the error of the final average, according
to the weighting used, by the above formula, (6).
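The combination of state errors into the error of the weighted average may be sketched as follows, assuming independent errors so that the error of the weighted mean is sqrt(sum of (w e)^2) divided by the sum of the weights. The four weights and errors are hypothetical and the function name is illustrative.

```python
import math

def weighted_mean_error(weights, errors):
    """Error of a weighted mean whose parts have independent errors."""
    numerator = math.sqrt(sum((w * e) ** 2 for w, e in zip(weights, errors)))
    return numerator / sum(weights)

# Hypothetical states: weight (relative number of sows) and error of each ratio.
state_weights = [4.0, 3.0, 2.0, 1.0]
state_errors = [0.010, 0.015, 0.020, 0.030]
us_error = weighted_mean_error(state_weights, state_errors)

# Check on the form: k equal weights and equal errors e give e / sqrt(k).
equal_case = weighted_mean_error([1.0] * 4, [0.02] * 4)

print(f"error of weighted average: {us_error:.5f}")
print(f"four equal parts of .02:   {equal_case:.5f}")
```

The heavily weighted states dominate the result, so a large error in a lightly weighted state does comparatively little harm.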

This, however, necessitates a considerable amount of
computation in securing standard deviations and correlation indices,
particularly when something like 100,000 records are involved.
Since the computation of errors gives us, at best, but
approximations, predicated upon an infinite number of events, a
condition which actually never obtains, any method whereby we can
secure such an approximation without such a large amount of
computation would be useful. This may be done by assuming that the
probable percentage error for all states may be obtained from
formula (4). This is on the hypothesis that rho, and the
dispersions, $\sigma/m$, will tend to remain constant. This is not,
of course, the exact case, but it is rational to suppose that as the
mean increases, i.e., as we deal with larger farms, so also will the
standard deviation increase; that is, the larger farm is but a
magnification of the smaller in its chief characteristics. This
being so, the dispersion, $\sigma/m$, will tend to remain constant.
The value of rho is a reflection of the rapidity with which
conditions may change within the industry; that is, a large number
of sows on the farm this year means that there was probably a large
number last year. It is probable, therefore, that the value of rho
will not change materially, since it is tied back rather definitely
to the size of farm, and this latter does not change very rapidly.

The formula on this basis, then, with s representing
probable percentage error, becomes

$$ s = \frac{K\sqrt{\sum (w^2/n)}}{\sum w} $$

in which K is the constant numerator of formula (4). If now we
further assume that w = n, i.e., that the distribution of the
tabulated records is made proportional to the weighting scheme,
thus eliminating the necessity of weighting by simple averaging
of all records, we may write

$$ s = \frac{K\sqrt{\sum (n^2/n)}}{\sum n} \qquad \ldots(7) $$

or,

$$ s = \frac{K\sqrt{\sum n}}{\sum n} \qquad \ldots(8) $$

or

$$ s = \frac{K}{\sqrt{N}} \qquad \ldots(9) $$

where $N = \sum n$, which is identical in form with formula (4).
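That proportional tabulation reduces the combined error to a single constant over the square root of the total records may be verified numerically. This is a sketch under assumptions: each state's percentage probable error is taken as K/sqrt(n) with K = 39.0 (a value worked from the Iowa constants, assumed here to hold for every state), the errors are combined as for independent parts of a weighted mean, and the record counts are hypothetical.

```python
import math

K = 39.0  # assumed constant: percentage probable error = K / sqrt(n)

def combined_percent_error(records_per_state, weights, k=K):
    """Percentage probable error of the weighted average of state ratios."""
    per_state = [k / math.sqrt(n) for n in records_per_state]
    numerator = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, per_state)))
    return numerator / sum(weights)

counts = [12000, 8000, 5000, 3000, 2000]  # hypothetical records per state

# Weights proportional to the records tabulated (w = n).
proportional = combined_percent_error(counts, counts)
shortcut = K / math.sqrt(sum(counts))

# Weights out of proportion to the records tabulated.
unbalanced = combined_percent_error(counts, [1, 1, 1, 1, 10])

print(f"w = n:            {proportional:.4f}")
print(f"K / sqrt(N):      {shortcut:.4f}")
print(f"unbalanced case:  {unbalanced:.4f}")
```

The unbalanced case shows the price of giving heavy weight to a state from which few records were tabulated.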


8) Determining the requisite number of records for a given degree
of accuracy.

The final ratio for the U. S. is rarely interpreted to a
finer degree than five per cent. This is perfectly proper since
factors operating subsequently to the taking of the survey, such
as hog cholera, for example, can easily cause a digression of this
degree from what the ratio would indicate.

Let us assume, however, that a statistical accuracy of one
per cent must be secured. Dividing this by two, in conformity with
the suggestion made above, gives 0.5%. Now the chance of an average
being more inaccurate than four times its probable error, owing to
the probabilities of sampling, is but one in one hundred. If the
probable error of our ratio was but one quarter of 0.5%, or 0.125%,
it is a practical certainty then that the final ratio would be
within one per cent of the true value. Taking 0.125% as the desired
percentage probable error we must secure, then, in our final ratio,
we read from the curve the number of records necessary to secure
this and find it to be about 100,000.

In a similar manner, if we start with the assumption that we
must secure an accuracy assuredly within but 5%, the number of
records required is only about 4,000, a rather startling
comparison. 50,000 records assure us an accuracy of about one and
one-half per cent (1.4%).

A proposition deserving consideration is: Is the increase in
labor for tabulating 100,000 instead of 50,000 justified by the
increase of assured statistical accuracy from 1.4% to 1.0%?
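The reading from the curve may be replaced by a short computation. The sketch below assumes the percentage probable error follows K/sqrt(n) with K = 39.0 (worked from the Iowa constants, not a figure stated in the text), and applies the two allowances described in this section: halving the desired accuracy for safety, and taking one quarter of the result as the probable error. Function names are illustrative.

```python
import math

K = 39.0         # assumed constant: percentage probable error = K / sqrt(n)
SAFETY = 2.0     # estimated errors ran about one-half the actual
PE_MULT = 4.0    # four probable errors taken as practical certainty

def records_required(assured_percent, k=K):
    """Records needed for the ratio to be assuredly within assured_percent."""
    target_probable_error = assured_percent / SAFETY / PE_MULT
    return (k / target_probable_error) ** 2

def assured_accuracy(n_records, k=K):
    """Assured accuracy, in per cent, given by n_records."""
    return SAFETY * PE_MULT * k / math.sqrt(n_records)

for pct in (1.0, 2.0, 5.0):
    print(f"within {pct:.0f} per cent: about {records_required(pct):,.0f} records")
for n in (50000, 100000):
    print(f"{n:,} records: within about {assured_accuracy(n):.1f} per cent")
```

Since the records required vary inversely as the square of the accuracy demanded, halving the permitted error calls for four times the tabulation.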


9) Note on relationship of correlation between x & y in the sample
and the correlation between averages of x and y in successive
samples.

The relation between correlation of x and y in the sample and
correlation of a series of successive samples' averages is thus:

We may first generalize from the correlation within the
sample, $r_{xy}$, to the correlation in the universe, $R_{xy}$:

$$ r_{xy} = R_{xy} \qquad \ldots(10) $$

within the limits of fluctuation of sampling.

And in like manner, as our best available measures, the
standard deviations of the sample are taken as equal to the standard
deviations of the universe, $\sigma_x$, $\sigma_y$.

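The regression equation gives the most probable values of x,
x', associated with y, assuming linear relationship:

$$ x' = r_{xy}\,\frac{\sigma_x}{\sigma_y}\,y \qquad \ldots(11) $$

or, letting $b = r_{xy}\,\sigma_x/\sigma_y$,

$$ x' = b\,y \qquad \ldots(12) $$

From (11)

$$ \sigma_{x'} = r_{xy}\,\sigma_x \qquad \ldots(13) $$

Thus

$$ r_{xy} = \frac{\sigma_{x'}}{\sigma_x} \qquad \ldots(14) $$

From (12)

$$ \sigma_{x'} = b\,\sigma_y \qquad \ldots(15) $$

Substituting (15) in (14)

$$ r_{xy} = \frac{b\,\sigma_y}{\sigma_x} \qquad \ldots(16) $$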

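Now in any random sample of n items, the average of the y's,
$m_y$, will be $\sum y/n$; the average of the x's, $m_x$, will be
$\sum x/n$. From a series of such samples, then, we would secure
$\sigma_{m_x}$ and $\sigma_{m_y}$. The measure of the regression
between the averages would be unchanged, b, for since each
$x' = b\,y$, each $\sum x'/n = b\sum y/n$. Thus, just as
$r_{xy} = b\,\sigma_y/\sigma_x$,

$$ r_{m_x m_y} = \frac{b\,\sigma_{m_y}}{\sigma_{m_x}} \qquad \ldots(17) $$

But, in probability,

$$ \sigma_{m_y} = \frac{\sigma_y}{\sqrt{n}} \quad\text{and}\quad \sigma_{m_x} = \frac{\sigma_x}{\sqrt{n}} \qquad \ldots(18) $$

Substituting in (17),

$$ r_{m_x m_y} = \frac{b\,\sigma_y/\sqrt{n}}{\sigma_x/\sqrt{n}} = \frac{b\,\sigma_y}{\sigma_x} = r_{xy} \qquad \ldots(19) $$

showing the identity of correlation in the sample and correlation
of samples' averages.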
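The identity for the strictly linear case may be illustrated by drawing many samples and correlating their averages. The sketch below is an illustration only: the universe is hypothetical (y normal with mean 14 and standard deviation 10, x equal to 0.8 y plus normal scatter of standard deviation 6, which gives a true correlation of .8), and the sample size and number of samples are arbitrary choices.

```python
import math
import random

def pearson(xs, ys):
    """Ordinary product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(9)
SLOPE, SCATTER_SD = 0.8, 6.0     # hypothetical linear universe
N_ITEMS, N_SAMPLES = 25, 2000    # items per sample, number of samples

all_x, all_y, mean_x, mean_y = [], [], [], []
for _ in range(N_SAMPLES):
    ys = [rng.gauss(14.0, 10.0) for _ in range(N_ITEMS)]
    xs = [SLOPE * y + rng.gauss(0.0, SCATTER_SD) for y in ys]
    all_x.extend(xs)
    all_y.extend(ys)
    mean_x.append(sum(xs) / N_ITEMS)
    mean_y.append(sum(ys) / N_ITEMS)

r_items = pearson(all_x, all_y)    # correlation of the original items
r_means = pearson(mean_x, mean_y)  # correlation of the samples' averages
print(f"items: {r_items:.3f}   averages: {r_means:.3f}")
```

The two coefficients should differ only by the fluctuation of sampling, since averaging shrinks both standard deviations by the same factor and leaves the regression unchanged.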

The reasoning in the foregoing, showing the equality of the
correlation in the original items in a series and in a series of
averages of the items, is based on the assumption that the
relationship between the variates is strictly linear. In the event
that a curvilinear relation exists the correlation of the averages
will nearly always be markedly higher than the correlation of the
original items, as may be reasoned thus:

(a) There is practically always a grouping of the
observations in a dot chart along the central portion of the
regression line, with the number of cases thinning towards the
extremes. If curvilinear, a curve may be drawn through them.

(b) A straight line will usually fit the central portion of
the curve considerably better than it will fit the entire length,
since the extremes largely define the curvilinear relation.

(c) In the process of averaging a series of successive
samples taken from the distribution along the curve, the central
portion of the original curve is given a great deal more weight,
owing to the concentration of items there, than are the extremes,
and thus the line of averages so secured and plotted tends to
resemble the central portion of the original curve, to which a
straight line may be fitted.

(d) Therefore, when dealing with data whose relation in the
original is curvilinear, the correlation of the averages
(correlation being in terms of linear measurement) tends to be
greater than the correlation in the original data.

To obviate this difficulty in estimating correlation of
averages in a series of successive random samples from the
correlation of the unit values in the given sample, we could
proceed as follows:

Supposing a curvilinear relationship, as frequently is the
case, we may eliminate the effect of this in reducing correlation
in the original by arbitrarily curving the regression line on the
dot chart so as to pass through the greatest number of dots.
Measuring the standard deviation (error) of the differences around
this line (in the direction of the co-ordinate representing the
dependent), s, and comparing with the original standard deviation
of the (dependent) variable, $\sigma$, gives us the correlation
index 4/, $\rho$ (rho), by the formula of form familiar in ordinary
correlation methods:

$$ \rho = \sqrt{1 - \frac{s^2}{\sigma^2}} \qquad \ldots(20) $$
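The computation of a correlation index may be sketched as follows, assuming the form rho = sqrt(1 - s^2/sigma^2). In place of a free hand curve the example uses a known curve, y = x squared, on seven hypothetical points, so that the contrast with the ordinary coefficient is exact; s is taken as the root mean square of the differences from the curve. All names are illustrative.

```python
import math

def pearson(xs, ys):
    """Ordinary product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_index(xs, ys, curve):
    """sqrt(1 - s^2/sigma^2), s measured around the fitted curve."""
    n = len(ys)
    my = sum(ys) / n
    var_y = sum((y - my) ** 2 for y in ys) / n                      # sigma squared
    var_res = sum((y - curve(x)) ** 2 for x, y in zip(xs, ys)) / n  # s squared
    return math.sqrt(max(0.0, 1.0 - var_res / var_y))

# Hypothetical points lying exactly on a symmetric curve.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x * x for x in xs]

r_linear = pearson(xs, ys)
rho = correlation_index(xs, ys, lambda x: x * x)
print(f"correlation coefficient: {r_linear:.3f}   correlation index: {rho:.3f}")
```

Here the straight-line coefficient vanishes although the dependence is perfect, which is the extreme instance of the understatement the index is meant to correct.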


This process is equivalent to shifting the dots on the chart
to a position establishing a linear relationship. If the dots had
been so moved the correlation of the original items, $r_{xy}$, would
be increased to equal $\rho$ (rho); and also the correlation so
increased would equal the correlation of the averages,
$r_{m_x m_y}$, since the curvilinearity would have been eliminated,
and the reasoning in the foregoing portion of this section would be
applicable. We may thus take $\rho$ instead of $r_{xy}$ as a better
indication of $r_{m_x m_y}$. This is basic and should be observed in
similar generalizations of correlation of averages of successive
samples from correlation of variates within the single sample.

Note: There are two correlation indices just as there are two
correlation ratios. But the indices have a greater tendency to
coincide in magnitudes, since they are derived from "smoothed"
functions avoiding the irregularities which may influence the
ratios. As a rule, therefore, if approximations are desired, it is
only necessary to calculate one.

By referring to the foregoing sections the close agreement
between $r_{m_1 m_2}$, correlation of the averages, and $\rho$,
correlation index in the original items, in actual practice may be
observed: .76 and .8 respectively.


Pig Survey.

Graph showing the relation between Probable Error of the Sows
Farrowed Ratio, and the Number of Records.




